ACE CLINICAL TRIAL

Author

Luboom Taa

Overview

The Angiotensin Converting Enzyme Inhibition, ACE(I), on Diabetic Nephropathy Trial was a prospective, double-blind, randomized, controlled clinical trial comparing the effects of captopril, an ACE inhibitor, against placebo in slowing the progression of renal disease in people with insulin-dependent diabetes mellitus (IDDM).

Four hundred nine (409) individuals participated in the original study at 30 centers between December 1987 and October 1990; the analysis data set for this assignment includes data from 350 of these participants. The primary end point of the trial was doubling of the baseline serum creatinine concentration. The main results of the ACE(I) clinical trial were published in the article ‘The Effect of Angiotensin-Converting-Enzyme Inhibition on Diabetic Nephropathy’ (NEJM. 329: 1456-1462. November 1993). Edmund Lewis, M.D., and Raymond P. Bain, Ph.D., PI and co-PI respectively of the ACE(I) clinical trial, released the original SAS data sets of the trial for data management and data analysis in the Advanced Epidemiologic Data Analysis (PUBH_6260) course at George Washington University.

The original SAS data set has been modified for the purposes of this assignment.

Serum creatinine is a measure of renal function with higher values indicating poorer kidney function. The use of captopril was intended to reduce the likelihood of doubling of serum creatinine during the study, indicating less progression of renal disease.

Methods

Data Importing and Inspection

The dataset used for this analysis was derived from the Angiotensin Converting Enzyme Inhibition (ACE[I]) trial, a double-blind, randomized controlled trial assessing the effect of captopril on renal disease progression in patients with insulin-dependent diabetes mellitus (IDDM). The analysis dataset (\(ACE.csv\)), containing 350 observations and multiple demographic and clinical variables, was imported into the statistical software R version 4.5.1 using the readr package. Data inspection was conducted to confirm successful importation, check for missing values, ensure correct variable types (numeric or character), and verify coding consistency. Formats (e.g., ‘TXGRP’ for treatment group) were applied to variables to enhance interpretability.

Summary Statistics and Research Questions

The primary research question examined whether treatment with captopril reduced the risk of doubling of serum creatinine compared to placebo. Secondary analyses explored (1) whether baseline mean arterial pressure differed between smokers and non-smokers, and (2) whether a correlation existed between baseline serum creatinine and age. Summary statistics were calculated for all variables using descriptive procedures, including means and standard deviations for continuous variables (e.g., BASEMAP, BASESCR, AGE) and frequency distributions for categorical variables (e.g., SEX, TXGRP, SMOKER). These summaries provided an overview of baseline characteristics and facilitated assessment of data balance between treatment groups.

Data Visualization and Inferential Analyses

Data visualization techniques were employed to describe and illustrate variable distributions and relationships. Histograms and boxplots depicted the distribution of continuous variables, while bar charts summarized categorical variable frequencies. For the primary outcome, logistic regression was used to compare the proportion of patients with doubled serum creatinine between the captopril and placebo groups. The secondary question on differences in mean arterial pressure between smokers and non-smokers was evaluated using an independent samples t-test (provided its assumptions were met). The correlation between baseline serum creatinine and age was assessed using Pearson’s correlation coefficient. Statistical significance was defined at α = 0.05, and results were reported with corresponding confidence intervals.
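The three planned analyses can be sketched as follows. This is a minimal illustration on simulated data, not the trial data: the data frame `sim` and all of its values are hypothetical, and only the column names mirror the variables of the analysis dataset. The t-test shown is R's default Welch version; passing `var.equal = TRUE` would give the classical equal-variance test.

```r
# Hypothetical simulated data; column names mirror the ACE variables,
# but the values are made up for illustration only.
set.seed(2024)
n <- 200
sim <- data.frame(
  TXGRP   = factor(rep(c("Captopril", "Placebo"), each = n / 2)),
  SMOKER  = factor(sample(c("NO", "YES"), n, replace = TRUE)),
  AGE     = runif(n, 18, 49),
  BASEMAP = rnorm(n, mean = 103, sd = 12),
  BASESCR = runif(n, 0.5, 2.5)
)
sim$DOUBLE <- factor(
  sample(c("NO", "YES"), n, replace = TRUE, prob = c(0.8, 0.2))
)

# Primary question: logistic regression of DOUBLE on TXGRP.
# glm() models the probability of the second factor level ("YES").
fit <- glm(DOUBLE ~ TXGRP, data = sim, family = binomial)

# Odds ratios with Wald 95% confidence intervals
or_ci <- exp(cbind(OR = coef(fit), confint.default(fit)))

# Secondary question 1: Welch two-sample t-test of BASEMAP by smoking status
tt <- t.test(BASEMAP ~ SMOKER, data = sim)

# Secondary question 2: Pearson correlation between BASESCR and AGE
ct <- cor.test(sim$BASESCR, sim$AGE, method = "pearson")

print(or_ci)
print(tt$conf.int)
print(ct$estimate)
```

Because the simulated outcome is generated independently of treatment, the estimates printed here carry no clinical meaning; the sketch only shows the shape of each call and where the confidence intervals come from.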

Import Libraries

Below we import the necessary libraries for data manipulation, visualization, and statistical analysis.

library(readr)
library(psych)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(plotly)
library(ggstatsplot)
library(vcd)
library(epitools)
library(lmtest)

Set working directory

# No-op as written; replace getwd() with the folder that contains Ace.csv if needed
setwd(getwd())

Importing the data

dfr <- readr::read_csv("Ace.csv")

Data Sanity Check

We note that our dataset has \(350\) observations and \(9\) variables.

dim(dfr)
[1] 350   9

We also note that we don’t have any missing data.

# Proportion of missing data
sum(is.na(dfr)) / (nrow(dfr) * ncol(dfr))
[1] 0
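A per-variable breakdown can be more informative than a single overall proportion, since it shows where any missing values sit. The sketch below uses a small hypothetical data frame `toy` (not the trial data) with one missing value planted on purpose.

```r
# Hypothetical toy data frame standing in for `dfr`; the NA is planted
# deliberately so the check has something to find.
toy <- data.frame(
  AGE    = c(38.7, NA, 33.7),
  SEX    = c("FEMALE", "MALE", "FEMALE"),
  SMOKER = c("YES", "YES", "NO")
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Overall proportion of missing cells (missing cells / (rows * columns))
na_prop <- mean(is.na(toy))
print(na_prop)
```

Applied to the analysis dataset, the same two calls would be `colSums(is.na(dfr))` and `mean(is.na(dfr))`.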

Let us take a look at the structure of our dataset.

str(dfr)
spc_tbl_ [350 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ BGIRA  : num [1:350] 1 1 1 1 1 3 1 1 1 1 ...
 $ BASEMAP: num [1:350] 114 115 110 111 104 ...
 $ BPCAT  : num [1:350] 3 3 3 3 3 3 3 3 3 3 ...
 $ SEX    : chr [1:350] "FEMALE" "MALE" "FEMALE" "MALE" ...
 $ AGE    : num [1:350] 38.7 43.7 33.7 37.4 42.1 ...
 $ BASESCR: num [1:350] 1.35 1.55 2.25 1.73 1.38 ...
 $ TXGRP  : num [1:350] 2 1 2 2 2 2 2 1 1 2 ...
 $ DOUBLE : chr [1:350] "YES" "YES" "YES" "YES" ...
 $ SMOKER : chr [1:350] "YES" "YES" "YES" "NO" ...
 - attr(*, "spec")=
  .. cols(
  ..   BGIRA = col_double(),
  ..   BASEMAP = col_double(),
  ..   BPCAT = col_double(),
  ..   SEX = col_character(),
  ..   AGE = col_double(),
  ..   BASESCR = col_double(),
  ..   TXGRP = col_double(),
  ..   DOUBLE = col_character(),
  ..   SMOKER = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

The properties of the \(9\) variables are tabulated below.

Table 1: Description of variables in the ACE clinical trial dataset.

| Variable | Description | Properties |
|---|---|---|
| \(\texttt{AGE}\) | Age of participant | years |
| \(\texttt{BGIRA}\) | Race | “1” = White, “2” = Black, and “3”, “4”, or “5” = Other |
| \(\texttt{SMOKER}\) | Smoking status | “YES” or “NO” |
| \(\texttt{SEX}\) | Sex | “MALE” or “FEMALE” |
| \(\texttt{BPCAT}\) | Blood pressure category at baseline visit | 1 = Normal, 2 = Borderline, 3 = Hypertensive |
| \(\texttt{TXGRP}\) | Treatment group | 1 = Captopril or 2 = Placebo |
| \(\texttt{BASESCR}\) | Baseline serum creatinine | mg/dL |
| \(\texttt{BASEMAP}\) | Mean arterial pressure at the baseline visit | mm Hg |
| \(\texttt{DOUBLE}\) | Doubled serum creatinine over the course of the study | “YES” or “NO” |

Descriptive Statistics

Summary Statistics for Continuous Variables

The mean, median, and standard deviation of \(\texttt{AGE}\), mean arterial pressure at baseline (\(\texttt{BASEMAP}\)), and serum creatinine at the baseline visit (\(\texttt{BASESCR}\)), stratified by treatment group, are summarized below.

# Setting labels for the treatment  variable levels
dfr$TXGRP <- factor(
  dfr$TXGRP,
  levels = c(1, 2), # The order matters
  labels = c("Captopril", "Placebo") # As per the encoding when the data was captured
)
# Summary statistics of age, BASEMAP and BASESCR by levels of treatment group 
psych::describeBy(
  dfr$AGE,
  group = dfr$TXGRP
)

 Descriptive statistics by group 
group: Captopril
   vars   n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
X1    1 173 34.96 7.31  35.07   34.85 7.74 20.54 48.97 28.44  0.1    -0.88 0.56
------------------------------------------------------------ 
group: Placebo
   vars   n  mean   sd median trimmed  mad   min max range skew kurtosis   se
X1    1 177 33.95 7.57  33.72   33.88 8.06 18.28  49 30.72 0.08    -0.82 0.57
psych::describeBy(
  dfr$BASEMAP,
  group = dfr$TXGRP
)

 Descriptive statistics by group 
group: Captopril
   vars   n   mean    sd median trimmed   mad min    max range skew kurtosis
X1    1 173 102.17 11.78    103  102.09 10.38  72 136.67 64.67 0.13     0.14
    se
X1 0.9
------------------------------------------------------------ 
group: Placebo
   vars   n   mean    sd median trimmed   mad min    max range skew kurtosis
X1    1 177 103.72 12.56 103.33  103.37 12.85  69 140.67 71.67 0.27     0.03
     se
X1 0.94
psych::describeBy(
  dfr$BASESCR,
  group = dfr$TXGRP
)

 Descriptive statistics by group 
group: Captopril
   vars   n mean   sd median trimmed  mad  min max range skew kurtosis   se
X1    1 173 1.27 0.42    1.2    1.23 0.44 0.58 2.3  1.72 0.69    -0.49 0.03
------------------------------------------------------------ 
group: Placebo
   vars   n mean   sd median trimmed  mad  min max range skew kurtosis   se
X1    1 177 1.26 0.41   1.18    1.24 0.41 0.55 2.5  1.95 0.58    -0.43 0.03
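
Overall (unstratified) summaries of the same three variables could be computed along the following lines. This is a sketch in base R on simulated data: the data frame `sim`, its values, and the helper name `mean_sd_median` are all hypothetical.

```r
# Hypothetical simulated data; values are illustrative, not trial data.
set.seed(1)
sim <- data.frame(
  TXGRP   = factor(rep(c("Captopril", "Placebo"), each = 50)),
  AGE     = runif(100, 18, 49),
  BASEMAP = rnorm(100, 103, 12),
  BASESCR = runif(100, 0.5, 2.5)
)
vars <- c("AGE", "BASEMAP", "BASESCR")

# Helper (hypothetical name) returning the three statistics of interest
mean_sd_median <- function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x))
}

# Overall summaries: one column per variable, one row per statistic
overall <- sapply(sim[vars], mean_sd_median)
print(round(overall, 2))

# The same statistics within each treatment group
by_group <- lapply(
  split(sim[vars], sim$TXGRP),
  function(d) sapply(d, mean_sd_median)
)
print(lapply(by_group, round, 2))
```

With the `psych` package already loaded in this report, `psych::describe()` on the same three columns would be an alternative route to the overall figures.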

Summary Statistics for Categorical Variables

The frequency and proportion for the categorical variables \(\texttt{SEX}\), \(\texttt{SMOKER}\), \(\texttt{BPCAT}\), and \(\texttt{BGIRA}\) stratified by treatment group are summarized below.

xtabs(
  ~ SEX + TXGRP,
  data = dfr
)
        TXGRP
SEX      Captopril Placebo
  FEMALE        82      76
  MALE          91     101
xtabs(
  ~ SMOKER + TXGRP,
  data = dfr
)
      TXGRP
SMOKER Captopril Placebo
   NO        154      23
   YES        19     154
# Setting labels for the blood pressure variable levels
dfr$BPCAT <- factor(
  dfr$BPCAT,
  levels = c(1, 2, 3), # The order matters
  labels = c("Normal", "Borderline", "Hypertensive") # As per the encoding when the data was captured
)
xtabs(
  ~ BPCAT + TXGRP,
  data = dfr
)
              TXGRP
BPCAT          Captopril Placebo
  Normal              21      15
  Borderline          66      73
  Hypertensive        86      89
# Setting labels for the race variable levels
dfr$BGIRA <- factor(
  dfr$BGIRA,
  levels = c(1, 2, 3, 4, 5), # The order matters
  labels = c("White", "Black", "Other","Other","Other") # As per the encoding when the data was captured
)
xtabs(
  ~ BGIRA + TXGRP,
  data = dfr
)
       TXGRP
BGIRA   Captopril Placebo
  White       158     155
  Black         7       4
  Other         8      18
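
The tables above report counts only. Proportions within each treatment group can be derived from any of them with `prop.table()`, as in this minimal sketch; the data frame `toy` and its counts are hypothetical and are not taken from the trial.

```r
# Hypothetical toy data; counts are illustrative, not trial data.
toy <- data.frame(
  SEX   = c("FEMALE", "MALE", "MALE", "FEMALE", "MALE", "MALE"),
  TXGRP = c("Captopril", "Captopril", "Captopril",
            "Placebo", "Placebo", "Placebo")
)

# Frequency table, built the same way as in the report
tab <- xtabs(~ SEX + TXGRP, data = toy)

# Proportions within each treatment group (each column sums to 1)
col_props <- prop.table(tab, margin = 2)
print(round(col_props, 3))
```

Using `margin = 2` conditions on the column variable (`TXGRP`), which is the natural choice when checking baseline balance between arms; `margin = 1` would instead give the share of each treatment within each sex.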

Data Visualization

Data Visualization for our Continuous Variables

Below are boxplots of the continuous variables \(\texttt{AGE}\), \(\texttt{BASEMAP}\), and \(\texttt{BASESCR}\), stratified by treatment group.

The \(\texttt{AGE}\) and \(\texttt{TXGRP}\) are considered in Figure 1.

AGE_TXGRP <- (dfr %>% ggplot2::ggplot(
  aes(
    x = TXGRP,
    y = AGE
  )
) +
  ggplot2::geom_boxplot(
    aes(
      fill = TXGRP
    ),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of AGE by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(
    palette = "Set1",
    direction = -1
  ) +
  ggplot2::xlab("Treatment Groups") +
  ggplot2::ylab("Age") +
  ggthemes::theme_clean());
plotly::ggplotly(AGE_TXGRP)
Figure 1: Age distribution by treatment group

The \(\texttt{BASEMAP}\) and \(\texttt{TXGRP}\) are considered in Figure 2.

BASEMAP_TXGRP <- (dfr %>% ggplot2::ggplot(
  aes(
    x = TXGRP,
    y = BASEMAP
  )
) +
  ggplot2::geom_boxplot(
    aes(
      fill = TXGRP
    ),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of BASEMAP by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(
    palette = "Set1",
    direction = -1
  ) +
  ggplot2::xlab("TXGRP") +
  ggplot2::ylab("BASEMAP") +
  ggthemes::theme_clean());
plotly::ggplotly(BASEMAP_TXGRP)
Figure 2: BASEMAP distribution by treatment group

The \(\texttt{BASESCR}\) and \(\texttt{TXGRP}\) are considered in Figure 3.

BASESCR_TXGRP <- (dfr %>% ggplot2::ggplot(
  aes(
    x = TXGRP,
    y = BASESCR
  )
) +
  ggplot2::geom_boxplot(
    aes(
      fill = TXGRP
    ),
    show.legend = FALSE
  ) +
  ggplot2::labs(
    title = "Distribution of BASESCR by Treatment Group",
    subtitle = "Comparison between Treatment groups"
  ) +
  ggplot2::scale_fill_brewer(
    palette = "Set1",
    direction = -1
  ) +
  ggplot2::xlab("TXGRP") +
  ggplot2::ylab("BASESCR") +
  ggthemes::theme_clean());
plotly::ggplotly(BASESCR_TXGRP)
Figure 3: BASESCR distribution by treatment group
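
The Methods also mention histograms for the continuous variables. One way such a plot might be drawn with ggplot2 is sketched here for \(\texttt{AGE}\), using simulated values; the data frame `sim`, the bin width, and the object name `age_hist` are illustrative assumptions rather than part of the original analysis.

```r
library(ggplot2)

# Hypothetical simulated data; values are illustrative, not trial data.
set.seed(42)
sim <- data.frame(
  TXGRP = factor(rep(c("Captopril", "Placebo"), each = 100)),
  AGE   = runif(200, 18, 49)
)

# Histogram of AGE, one panel per treatment group
age_hist <- ggplot(sim, aes(x = AGE, fill = TXGRP)) +
  geom_histogram(binwidth = 2.5, colour = "white", show.legend = FALSE) +
  facet_wrap(~ TXGRP) +
  scale_fill_brewer(palette = "Set1", direction = -1) +
  labs(
    title = "Distribution of AGE by Treatment Group",
    x = "Age",
    y = "Count"
  )

print(age_hist)
```

Swapping `AGE` for `BASEMAP` or `BASESCR` (with a bin width suited to each scale) would give the corresponding histograms for the other two variables.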

A scatterplot that illustrates the relationship between \(\texttt{AGE}\) and \(\texttt{BASEMAP}\) is shown in Figure 4.

ggstatsplot::ggscatterstats(
  data = dfr,
  x = AGE,
  y = BASEMAP,
  bf.message = FALSE
)
Figure 4: AGE vs. BASEMAP

Data Visualization for our Categorical Variables

The mosaic plot in Figure 5 illustrates the relationship between \(\texttt{TXGRP}\) and \(\texttt{SEX}\).

vcd::mosaic(
  ~ SEX + TXGRP,
  data = dfr,
  highlighting = "TXGRP",
  highlighting_fill = c("deepskyblue", "yellow"),
  main = "Mosaic plot of SEX and TXGRP",
  labeling_args = list(gp_labels = gpar(fontsize = 12)
)) 
Figure 5: Sex vs. TXGRP